[Day11] - Context: pl.DataFrame.filter()

2025 iThome Ironman Contest

DAY 11

Software Development

Polars熊霸天下 series, part 11

17th Ironman Contest, python, polars

Jerry Wu

2025-09-17 00:03:34

Today we will learn how to use pl.DataFrame.filter().

Today's outline:

Imports and setup
pl.DataFrame.filter()
codepanda

0. Imports and setup

import pandas as pd  # needed for the codepanda section
import polars as pl

data = {"col1": [1, 2, 3], "col2": ["x", "y", "z"]}
df = pl.DataFrame(data)

shape: (3, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ i64  ┆ str  │
╞══════╪══════╡
│ 1    ┆ x    │
│ 2    ┆ y    │
│ 3    ┆ z    │
└──────┴──────┘

1. `pl.DataFrame.filter()`

pl.DataFrame.filter() keeps only the rows that satisfy a given condition. For example, to select the rows where the "col1" column is less than 2 (Note 1):

df.filter(pl.col("col1").lt(2))

shape: (1, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ i64  ┆ str  │
╞══════╪══════╡
│ 1    ┆ x    │
└──────┴──────┘

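Two things are worth noting here:

After filtering, there is 1 row, fewer than the original 3.
After filtering, there are 2 columns, the same as the original 2.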

The expr passed to pl.DataFrame.filter() must evaluate to a boolean result (True or False). If you pass an expr that does not return booleans, Polars raises an error:

❌
# InvalidOperationError 
df.filter(pl.col("col1"))

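Also note that when the boolean result is False for every row, no row satisfies the condition and an empty dataframe is returned. For example, selecting the rows where the "col1" column is greater than 10: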

df.filter(pl.col("col1").gt(10))

shape: (0, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ i64  ┆ str  │
╞══════╪══════╡
└──────┴──────┘

To select the rows where the "col2" column is not equal to "y", negate the condition with the ~ operator:

df.filter(~pl.col("col2").eq("y"))

shape: (2, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ i64  ┆ str  │
╞══════╪══════╡
│ 1    ┆ x    │
│ 3    ┆ z    │
└──────┴──────┘

To select the rows where the "col1" column is less than 2 or the "col2" column equals "y", combine the conditions with the | operator:

df.filter(pl.col("col1").lt(2) | pl.col("col2").eq("y"))

shape: (2, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ i64  ┆ str  │
╞══════╪══════╡
│ 1    ┆ x    │
│ 2    ┆ y    │
└──────┴──────┘

To select the rows where the "col1" column is less than 2 and the "col2" column equals "x", combine the conditions with the & operator:

df.filter(pl.col("col1").lt(2) & pl.col("col2").eq("x"))

shape: (1, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ i64  ┆ str  │
╞══════╪══════╡
│ 1    ┆ x    │
└──────┴──────┘

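It is also worth mentioning that when there are several equality checks and they are all joined by "and", the column names can be passed as keyword arguments. For example, to select the rows where the "col1" column equals 1 and the "col2" column equals "x":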

df.filter(col1=1, col2="x")

shape: (1, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ i64  ┆ str  │
╞══════╪══════╡
│ 1    ┆ x    │
└──────┴──────┘

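To stress the point again, this form is quite limited. All three of the following conditions must hold before it can be used:

It only supports equality checks.
The equality checks are all joined by "and".
Every column name must be a valid Python identifier.

You also have to be sure that you or your colleagues will still remember these three rules when reading the code later, so I do not really recommend this form.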

2. `codepanda`

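The pandas counterpart of Polars' pl.DataFrame.filter() is pd.DataFrame.query().

Inside pd.DataFrame.query() you can refer to a column directly by its name and compute with it. For example, to select the rows where the "col1" column is less than 2 or the "col2" column equals "y":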

df_pd = pd.DataFrame(data)

df_pd.query("col1 < 2 | col2 == 'y'")

   col1 col2
0     1    x
1     2    y
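
For readers more used to boolean-mask indexing in pandas, the same selection can also be written without a query string. This sketch is a side-by-side comparison, not something the original example relies on; the variable names are illustrative.

```python
import pandas as pd

df_pd = pd.DataFrame({"col1": [1, 2, 3], "col2": ["x", "y", "z"]})

by_query = df_pd.query("col1 < 2 | col2 == 'y'")

# Boolean-mask indexing; each comparison needs its own parentheses.
by_mask = df_pd[(df_pd["col1"] < 2) | (df_pd["col2"] == "y")]

print(by_query.equals(by_mask))  # True
```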

To reference a variable from the surrounding Python scope, prefix it with @. For example, the query above can be rewritten as:

col1_target, col2_target = 2, "y"
df_pd.query("col1 < @col1_target | col2 == @col2_target")

   col1 col2
0     1    x
1     2    y

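Notes

Note 1: In Python, operations on an object are ultimately driven by its dunder methods, such as __lt__(). This is what lets us use familiar symbols, for example "<" to call __lt__(). Polars implements most of the commonly used dunder methods, a technique known as operator overloading. For example, to select the rows where the "col1" column is less than 2, you can write: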

df.filter(pl.col("col1") < 2)

shape: (1, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ i64  ┆ str  │
╞══════╪══════╡
│ 1    ┆ x    │
└──────┴──────┘

The source code also shows that Polars implements matching "shortcut" methods. For example, lt() ends up calling __lt__():

# polars/py-polars/polars/expr/expr.py

class Expr:
    ...
    def lt(self, other: Any) -> Expr:
        return self.__lt__(other)
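
The delegation pattern can be reproduced in a few lines of plain Python. `ToyExpr` below is a hypothetical stand-in written for this note, not part of Polars; it returns a description string instead of a real expression so the effect of each call is easy to see.

```python
class ToyExpr:
    """Hypothetical stand-in for an expression; not part of Polars."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __lt__(self, other: object) -> str:
        # Called by Python when the "<" operator is used.
        return f"{self.name} < {other!r}"

    def lt(self, other: object) -> str:
        # Named shortcut that delegates to the dunder method.
        return self.__lt__(other)


expr = ToyExpr("col1")
print(expr < 2)    # col1 < 2
print(expr.lt(2))  # col1 < 2
```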

So selecting the rows where the "col1" column is less than 2 can also be written as:

df.filter(pl.col("col1").lt(2))

shape: (1, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ i64  ┆ str  │
╞══════╪══════╡
│ 1    ┆ x    │
└──────┴──────┘

Choose whichever form suits your habits.

Code

Link to today's code.
